(54) A chemically synthesised artificial promoter for high level expression of transgenes and a
method for its synthesis

(57) The invention relates to a chemically synthe- 
sised artificial promoter comprising a DNA sequence 
designed for the target level and pattern of gene expres- 
sion, by strategically putting together several signature 
sequences identified by sequence alignment and statis- 
tical analysis of a large database constructed for this 
purpose. 
Description
FIELD OF INVENTION 

[0001] The present invention relates to a chemically synthesised and theoretically designed promoter for high level
expression of transgenes in different organisms and a method for designing of the said promoter. The invention further
relates to testing the said promoter to demonstrate high level activity as compared to the natural Cauliflower Mosaic
Virus (CaMV) 35S promoter.
[0002] The invention emphasises the development of an artificial DNA sequence on the basis of computational
analysis of various genes which express at high level in plants. The invention provides a new outlook in the field of
synthesizing designer/custom made transcriptional regulatory elements that act in cis on genes. The present invention
takes pride in claiming that DNA elements that function as efficient regulatory sequences in the cells of higher
organisms can be designed for the desired level, on the basis of [...].

BACKGROUND OF INVENTION 

[0003] One of the main objectives of plant genetic engineering is to develop transgenic plants with new characteristics.
These include insect resistance, [...] resistance, herbicide resistance, yield enhancement, stress
tolerance [...] and expression of industrially valuable proteins in economically profitable expression
systems, like plants, etc. Many factors contribute to high level expression of genes which code for such desired
characteristics [...] and translational events. The abundance of any one
transcript in a cell directly relates to transcriptional events, which in turn depend upon the strength of the promoter
from which it is expressed. Thus, for the development of transgenic plants where a high level of transgene expression
is to be obtained, it becomes absolutely indispensable that the transgene be expressed from a strong promoter, that the
transcript is stable, that it is translated efficiently and that the resultant protein is also stable in the plant cell.
Each of these steps synergistically contributes to enhancing the level of expression of the product of the transgene.

[0004] A promoter can be defined as a pool of cis-acting elements [...] in co-ordination with trans-acting
[...] to achieve expression of the gene attached to it. A promoter provides an efficient docking site for
[...] and the related accessory proteins, which in turn contribute to the transcription of the gene [...].
Thus, as mentioned, promoters are highly specialised DNA sequences which govern the time and
efficiency of transcription. A promoter is classified as a constitutive promoter when it is operable almost equally at all
[...] organism, for example, the CaMV 35S promoter. Other promoters are tissue specific or inducible. The
strength of a promoter varies depending on the frequency of initiation of transcriptional events. Depending on strength,
promoters can further be classified as strong or weak.

[0005] Different types of promoters are required in plant biotechnology, depending upon the target use. Constitutive
[...] most useful to develop transgenic plants for high level production of commercially
[...] expression is also desirable in several situations for modifying metabolic pathways
and for improving plants to withstand a variety of stress situations.

[0006] [...] mainly deal with the identification of natural promoter elements in genes and their improvement.
[...] identification of the CaMV promoter by Odell et al., Nature 313: 810-812 (1985), who had
shown the strength and constitutive nature of the CaMV 35S promoter. Later, Jensen et al., Nature 321: 669-674 (1986),
Jefferson et al., EMBO J. 6: 3901-3907 (1987), and Sander et al., Nucleic Acids Research 4: 1543-1558 (1987), showed
measurable levels of reporter gene mRNA expressed from the 35S CaMV promoter in extracts prepared from leaves
[...] plants. The CaMV 35S promoter has been widely used by scientists in the field
[...]. Morelli et al., Nature (1985) 315: 200-204, described that the CaMV 35S promoter is transcribed
at a [...] rate as evidenced by a ten-fold increase in transcription products as compared to the NOS
[...] Science [...] 738-743, Bevan et al., EMBO J. (1985) 4: 1921-1926, Morelli et al., Nature
(1985) 315: 200-204, and Shah et al., Science (1986) 233: 478-481, described that the 35S CaMV promoter is moderately
[...] constitutively [...]. Therefore, the CaMV 35S has been [...] to express a number of foreign
[...]. Odell et al., Nature 313: 810-812 (1985), described that initiation of transcription from the
[...] is dependent on proximal sequences, which included a TATA element, while the rate of transcription was
[...] sequences that were dispersed [...] 300 bp of upstream DNA. Simpson et al., Nature 323: 551-554
[...], described this region as an enhancer region (sequences which activate transcription are termed enhancers).
[0007] Subsequently, other workers tried to improve the CaMV promoter. Kay et al., Science 236: 1299-1302 (1987),
[...] of the naturally existing CaMV 35S promoter and reported enhancement in its activity.
[...] 10: 263-272 (1988), [...] the use of a part of the CaMV 35S promoter as an enhancer
in the nopaline synthase promoter. Mitsuhara et al., Plant Cell Physiol. 37(1): 49-59 (1996), compared many
combinations of different CaMV 35S promoter sequence elements. By increasing the number of repeats of the native
enhancer element, they obtained enhanced expression of the reporter gene. Ni et al., The Plant Journal 7(4): 661-676
(1995), combined portions of the naturally occurring octopine and mannopine synthase promoters to develop an
efficient chimeric promoter. Ellis et al., EMBO J. 6: 11-16 (1987), reported the use of a natural octopine synthase
promoter fragment to enhance the activity of the maize (adh-1) gene.

[0008] Other developments include identification of other natural promoter elements for expression of genes in
plants. These include the use of the Figwort Mosaic Virus promoter for achieving enhanced expression as per US patent
5,378,619, the Rubisco promoter as per US patent 4,962,028, a chimeric CaMV enhanced mannopine synthase promoter as
per US patent 5,106,739, an enhanced CaMV 35S promoter as per US patent 5,322,938, and the glutamine synthetase
promoter for organ specific expression in plants as per US patent 5,391,725.

[0009] As of now, attempts have been made to identify the naturally existing promoter sequences to be used as
such or to exchange or rearrange parts of natural promoters so as to achieve a higher level of expression. However, in
no case has an attempt been made to design an artificial promoter based on knowledge gained from computational
analysis of the various DNA sequences present upstream of gene sequences reported in the database.

SUMMARY OF THE INVENTION 

[0010] Some of the objectives of this invention are to design a synthetic promoter aimed at achieving the desired
level of expression of the target genes in plant cells, but also in bacteria, yeast, lower eukaryotic cells and animal
cells; also to use such a promoter, in combination with specific regulatory elements, to modify it appropriately so as
to make it tissue specific, development stage specific, organ specific and/or inducible by specific external
environmental/applied factors; as well as to provide a new approach for studying the complexity of the interaction
between cis-acting elements and trans-acting factors.

[0011] The present invention relates to analysing the gene sequence database for designing promoters for achieving
the desired level of expression of transgenes in different organisms and a method for synthesis of the designed
promoter. The invention further relates to testing and demonstrating high level of activity of the synthetic promoter as
compared to the natural CaMV 35S promoter.

[0012] As an example, the invention demonstrates the designing of an artificial DNA sequence on the basis of
computational analysis of various genes which express at high level in plants. The invention provides a new look in the
field of synthesizing designer/custom made transcriptional regulatory elements. The approach includes the identification
of DNA sequences representing the minimal promoter (SEQ ID NO 2, 3 and 5), conserved domain I and its sub domains a,
b and c (SEQ ID NO 6), transcription start site context (SEQ ID NO 4), conserved domain II and its sub domains a, b,
c and d (SEQ ID NO 7, 8, 9 and 10), conserved domain III (SEQ ID NO 11), domain between TATA and TS (SEQ ID NO
12), 5' untranslated leader (SEQ ID NO 13), translational initiation codon contexts (SEQ ID NO 14 and 15) that act in
cis on the gene, and N-terminal amino acids (SEQ ID NO 16) that may give stability to proteins. An example of such a
construct designed in this study is SEQ ID NO: 1. The present invention takes pride in claiming that DNA elements that
function as efficient promoter regulatory sequences in a variety of tissues and in a wide spectrum of organisms can be
designed on the basis of knowledge generated from computational biology and bioinformatics and synthesised. This
invention shows that a biologically active and efficiently functional promoter can be synthesised to express in even the
most complex organisms.

BRIEF DESCRIPTION OF THE ACCOMPANYING FIGURES
[0013]

FIG. 1 Describes designing of overlapping oligos for synthesis of a double stranded DNA containing a promoter
representing SEQ ID NO. 1.
FIG. 2 Describes restriction sites in the synthetic promoter designed in this study (SEQ ID NO. 1).
FIG. 3 Shows primer for introduction of ATG context in synthetic promoter (SEQ ID NO. 17).

DETAILED DESCRIPTION OF THE INVENTION 

[0014] As an example, the invention provides a chemically synthesised promoter comprising a DNA sequence for
high level expression of transgenes in different organisms, as exemplified by SEQ ID NO 1, and a method for the
synthesis of the said promoter.

[0015] The invention further provides a method for testing high-level gene expression in plants. 

[0016] In an embodiment of the invention, a chemically synthesised promoter can comprise minimal domain (a)
as depicted in SEQ ID NO. 2 (for high level expression of genes, i.e., strong promoter) or 3 (for low level expression
of genes, i.e., weak promoter) and their derivatives comprising variations as seen in Tables 1 and 2 respectively,
functioning as TATA contexts in reference to the artificial synthetic promoter, falling between the positions -26 to -43
(the numbering of nucleotides is such that +1 indicates the first nucleotide of the transcription start site).
[0017] In another embodiment of the invention, the chemically synthesised promoter further comprises SEQ ID No.
4 and its derivatives comprising variations as seen in Table 3, functioning as consensus sequences for a transcription
start site in an artificial synthetic promoter, falling between the positions -6 to +1.

[0018] In yet another embodiment of the invention, the chemically synthesised promoter further comprises minimal
domain (b) as depicted in SEQ ID No. 5, falling between positions -39 to -84 of a synthetic promoter.
[0019] In another embodiment of the invention, the chemically synthesised promoter further comprises conserved
domain I and its sub domains a, b and c as depicted in SEQ ID No. 6, falling between the positions -85 to -130.
[0020] In yet another embodiment of the invention, the chemically synthesised artificial promoter further comprises
conserved domain II and its sub domains a, b, c and d as depicted in SEQ ID Nos. 7, 8, 9 and 10, falling between the
positions -134 to -350.

[0021] In yet another embodiment of the invention, the chemically synthesised artificial promoter further comprises
conserved domain III as depicted in SEQ ID NO. 11, falling between the positions -209 to -230.
[0022] In another embodiment of the invention, the chemically synthesised promoter further comprises SEQ ID No.
12 functioning as typical sequences between the TATA sequence and transcription start site, falling between the
positions +1 to -26.

[0023] In yet another embodiment of the invention, the chemically synthesised artificial promoter further comprises
SEQ ID No. 13 functioning as a 5' untranslated leader, and its translational enhancer 'CAA' type region, falling between
the positions +1 to +89.

[0024] In another embodiment of the invention, the chemically synthesised artificial promoter further comprises
SEQ ID NOs. 14 (for high level expression of genes, i.e., strong promoter) and 15 (for low level expression of genes,
i.e., weak promoter) and their derivatives comprising variations as seen in Tables 4 and 5, functioning as consensus
sequences around the ATG start codon, falling between the positions +83 to +102.

[0025] In yet another embodiment of the invention, the chemically synthesised artificial promoter further comprises
SEQ ID NO. 16 and its derivatives, falling between the positions AA1 to AA4 and comprising variation to the extent seen
in Table 6, where the said amino acids, as indicated at the first four positions, are required at the N-terminus for high
level expression of a transgene in cells (AA1-AA4 indicates amino acids one through four of the protein).
[0026] In another embodiment, the invention further provides a method for chemically synthesising a promoter for 
expressing genes at a high level in different organisms comprising: 

a) Classifying genes database into highly and lowly expressed genes based on their signature sequences around 
certain transcription/ translation regulatory points that determine expression of the target genes. 

b) Identifying conserved domains of the highly expressed genes as identified in step(a) in critical elements com- 
prising a minimal promoter, conserved domain I and its sub domains a, b and c, conserved domain II and its sub 
domains a, b, c and d, conserved domain III, region between transcription start and TATA site, 5' untranslated 
leader, translational initiation codon ATG contexts and N-terminal amino acids. 

c) Designing synthetic promoters by placing identified critical sequence elements as given in step (b) above in a co- 
ordinated manner as depicted, for example, in SEQ ID. NO 1 or its other combinations to achieve desired level of 
expression of a reporter or target gene. 

d) Carrying out synthesis of the promoter DNA as obtained in step (c) above by synthesising overlapping oligos, as 
exemplified as the promoter of SEQ ID NO. 1, assembling the said oligos into double stranded DNA as depicted in 
Fig 1 and cloning of the said promoter with a reporter gene, or a targeted gene selected for expression. 
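Step (d) can be illustrated with a minimal sketch, assuming a simple fixed-step tiling scheme in which consecutive oligos alternate between the two strands and share a constant overlap. The function names, the oligo length and the overlap are hypothetical choices made for illustration; they are not taken from FIG. 1 and do not describe the oligos actually used. The short domain III sequence (SEQ ID No. 11, given later in this description) serves only as a test input.

```python
# Hypothetical sketch: tile a designed sequence into overlapping oligos
# on alternating strands. Length and overlap are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_overlapping_oligos(seq: str, length: int = 40, overlap: int = 20):
    """Return (start, strand, oligo) tuples that together cover seq.

    Consecutive windows share `overlap` nucleotides. Even-numbered windows
    are written as the top strand ("+"), odd-numbered windows as the
    reverse complement ("-"), so neighbouring oligos can anneal over the
    shared region.
    """
    if not 0 < overlap < length:
        raise ValueError("overlap must be positive and shorter than length")
    step = length - overlap
    oligos = []
    start = 0
    while True:
        window = seq[start:start + length]
        strand = "+" if len(oligos) % 2 == 0 else "-"
        oligo = window if strand == "+" else reverse_complement(window)
        oligos.append((start, strand, oligo))
        if start + length >= len(seq):  # last window reaches the 3' end
            break
        start += step
    return oligos


if __name__ == "__main__":
    for start, strand, oligo in design_overlapping_oligos(
            "CGATCTGACCATCTCTAGATCG", length=10, overlap=4):
        print(start, strand, oligo)
```

In practice the overlap length would be chosen for annealing stability and the junctions checked against unwanted secondary structure; this sketch ignores both considerations.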

[0027] In yet another embodiment, organisms for high level expression of targeted genes are selected from plants
or different parts of plants, including leaves, stems, roots and storage tissues like potato tuber, and also from
different phyla, including dicot plants belonging to widely different families and bacteria.

[0028] In yet another embodiment, a method for transient expression of the targeted gene from the said promoter 
in a variety of different tissues and cells as well as stable expression in different parts of transgenic organisms is 
achieved. 

[0029] In yet another embodiment, the mode of expression may be constitutive with preferential expression in cer- 
tain tissues, like roots in this case, in transient or in stable transgenic organisms from the said artificial promoter. 
[0030] Another embodiment of the invention provides a method for testing the high level expression from the chem- 
ically synthesised promoter, following transient transformation of plant cells by polyethylene glycol (PEG) mediated 
transformation of plant protoplasts, as well as by biolistic mediated transformation of a variety of tissues followed by the 
reporter gene assay as compared to the expression from a natural CaMV 35S promoter. For the purpose of the present 
invention, enhanced expression meant several fold higher activity than that from natural CaMV 35S promoter. 
[0031] In another embodiment of the invention, the activity level of the promoter will depend on the host plant species
or the type of explant used for the said purpose. 

[0032] In yet another embodiment, the test plants used as reference plants are whole tobacco plant, excised
tobacco leaf, isolated tobacco leaf cells, cabbage stem and potato tuber. However, expression was also established in
the bacterium Agrobacterium tumefaciens.

[0033] Computational analysis was carried out using the software from PC-Gene and database release 18-0 from
Oxford Molecular Biology Group, Switzerland. A plant database comprising entries from plant genes only was created
from the database CDEM 46 IN. It had 13,393 nucleic acid sequences. Depending on resemblance to a putative motif
in the TATA and ATG regions, identified by comparing homology among 36 known highly expressed genes in plants, the
database was classified into 262 transcriptionally highly expressed genes. Conserved motifs around the TATA region
(Tables 1 and 2), transcriptional start site (Table 3) and translation initiation codon ATG (Tables 4 and 5) were
identified for highly (Tables 1, 3 and 4) and lowly (Tables 2 and 5) expressed genes. The databases were then screened
for possible conserved domains in the promoter region and further upstream of the coding region (reading frame) of
genes. The highly conserved motif sequences, along with the relatively less conserved regions and their variations to
the extent seen in Tables 1 to 5, gave characteristic component sequences that were assembled to develop an artificial
promoter. The individual motif sequences that were most highly conserved were identified as SEQ ID No. 2 to SEQ ID
No. 16 and assembled to obtain the promoter regulatory sequence SEQ ID No. 1.
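The idea of deriving a conserved motif and its variation from aligned sequences can be sketched as a per-position tally. This is only a minimal illustration under stated assumptions: the four aligned nine-nucleotide strings are invented for the example and are not taken from the database or from Tables 1 to 5, the function names are hypothetical, and the analysis described above was carried out with PC-Gene rather than with this code.

```python
from collections import Counter


def position_counts(aligned):
    """Count nucleotides column by column for equal-length aligned motifs."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(column) for column in zip(*aligned)]


def consensus(aligned, threshold=0.5):
    """Return a consensus string.

    A position is reported as its most frequent nucleotide only if that
    nucleotide occurs in more than `threshold` of the sequences;
    otherwise the position is written as N.
    """
    result = []
    for counts in position_counts(aligned):
        base, n = counts.most_common(1)[0]
        result.append(base if n / len(aligned) > threshold else "N")
    return "".join(result)


if __name__ == "__main__":
    # Invented example alignment (not real data).
    toy_alignment = ["TATATATAG", "TATATATAG", "TATAAATAG", "TCTATATAC"]
    print(consensus(toy_alignment))        # majority rule
    print(consensus(toy_alignment, 0.8))   # stricter rule, more N positions
```

The per-position counts returned by `position_counts` correspond to the kind of variation that a table of nucleotide frequencies would record.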

[0034] As seen from SEQ ID No. 1, several characteristic domains and the extent of variation can be identified in
different regions of promoters by statistical analysis of gene sequence data, as presented in Tables 1 to 5. These
domains were, viz.:

i) Minimal promoter region

a) Minimal domain (a): TATA box, as seen in data compiled in Tables 1 (for highly expressed) and 2 (for lowly
expressed) genes.

b) Minimal domain (b)

ii) Domain I (sub domains a, b and c)
iii) Domain II (sub domains a, b, c and d)

iv) Region between minimal promoter and transcription initiation start site

v) Domain III

vi) 5' Untranslated leader region

vii) Translation initiation codon context, as seen in data compiled in Tables 3 (for highly expressed) and 4 (for lowly
expressed) genes.

viii) N terminal amino acids, as seen in data compiled in Table 5.

[0035] Though the above mentioned different regions are predicted to contribute synergistically in determining the
high level activity of a promoter, not all of them are essential for a lower level of activity of the promoter. Although
this invention demonstrates that the individual motifs can be put together to assemble a functionally efficient promoter
regulatory region, the variations in the occurrence of individual nucleotides at any given position, as seen in Tables 1
to 5, make it obvious that various combinations excluding some of these elements can be functional to different extents.
[0036] A minimal promoter in eukaryotes is the DNA sequence proximal to the transcription initiation site. It usually
contains an initiator cis element typically located -30 nucleotides upstream of the transcription start site (Aso, et
al., J. Biol. Chem. 269: 26575-26583, 1994). The minimal promoter mainly consists of a sequence commonly called the
TATA element. Modulation of the formation or stability of the initiation complex by trans-acting proteins that bind to
distal cis elements requires an intact TATA box (Horikoshi, et al., Cell 54: 665-669, 1988). Zhu, et al., The Plant
Cell 7: 1681-1689, 1995, showed TATATTTAA to be a functional TATA box for the phenylalanine ammonia-lyase (PAL)
promoter. In vitro studies conducted by Mukumoto, et al., Plant Mol. Biol. 23: 995-1003 (1993), showed TATATATA as the
sequence required for a plant TATA box. Till date, it is not known if TATATATA can be used as the minimal promoter in
plants for expression of transgenes. Moreover, the minimal domain (a) used in this study and as depicted in SEQ ID
No. 2 is different from those described in the earlier studies. All promoters in the database, as summarised in Table 1,
have sequence motifs representing SEQ ID No. 2 or its variants within statistically insignificant limits. Table 1
represents the characteristic feature of TATA in highly expressed genes and the variation in the TATA region as noticed
in different genes. The sequence domain shown in SEQ ID No. 2, T3(T/A)TNTCACTATATATAG (where T3 indicates that TTT
appears at that site and N indicates that any one of the four nucleotides A, T, G or C can appear at that site), is
referred to as minimal domain (a) with respect to the artificial synthetic promoter in this study. Our analysis of the
database shows that the position of the sequence identified by us can vary from 40 to 28 nt upstream of the
transcription start site. The lowly expressing genes show the TATA consensus as NT3N4T2TATANNNAT (SEQ ID No. 3),
which differs significantly from that found in consensus SEQ ID No. 2, identified by us as a characteristic sequence in
highly expressed genes. Thus the selection of the sequence of the TATA consensus region and its distance from the
transcription start site may determine the level of gene expression. Mukumoto, et al., Plant Mol. Biol. 23: 995-1003
(1993), and Keith and Chua, EMBO J. 5: 2419-2425 (1986), deduced the role of the TATA element by experimental
evaluation. Their results established the requirement of a sequence with certain critical nucleotide positions within
the TATA element. Mutations at different positions were reported to reduce the activity of the promoter considerably.
An optimized TATA consensus sequence should be situated at a certain distance from the transcription initiation site
for efficient initiation of transcription. A less than proper distance of the TATA element from the transcription start
site and a widely different variant TATA box sequence can reduce expression, as shown by Zhu, et al., The Plant Cell
7: 1681-1689 (1995). Efficient recognition of the TATA element by TBP and TAF (TBP associating factors) regulatory
factors determines the efficiency of transcription by RNA polymerase II. Our results identify a distinct
sequence that can be employed to express genes in plants.
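The degenerate notation used for SEQ ID No. 2 can be made concrete with a minimal sketch that writes the consensus as a regular expression and reports where it occurs relative to a transcription start site. It assumes the consensus reads T3(T/A)TNTCACTATATATAG with no internal gap; the function name and the test sequence are hypothetical, and the code is not the screening procedure used in the study.

```python
import re

# SEQ ID No. 2 written as a regular expression:
# T3 -> TTT, (T/A) -> [TA], N -> [ACGT].
MINIMAL_DOMAIN_A = re.compile(r"TTT[TA]T[ACGT]TCACTATATATAG")


def find_minimal_domain_a(seq: str, tss_index: int):
    """Find the consensus upstream of the transcription start site.

    `tss_index` is the 0-based index of the +1 nucleotide. Only the
    region before it is searched, so every reported position is
    negative (the nucleotide just before +1 is -1). Returns a list of
    (first_position, last_position, matched_sequence) tuples.
    """
    hits = []
    for match in MINIMAL_DOMAIN_A.finditer(seq.upper(), 0, tss_index):
        first = match.start() - tss_index
        last = match.end() - 1 - tss_index
        hits.append((first, last, match.group()))
    return hits


if __name__ == "__main__":
    # Invented test sequence: filler, one consensus instance, filler, then +1.
    upstream = "G" * 10 + "TTTATGTCACTATATATAG" + "C" * 25
    print(find_minimal_domain_a(upstream + "ATG", tss_index=len(upstream)))
```

A search of this kind only tests for exact membership in the stated consensus; scoring partial matches against the frequencies in Table 1 would need a weighted approach.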

[0037] Another distinct domain in a minimal promoter is minimal domain (b) as depicted in SEQ ID No. 5; its
position in the synthetic promoter is marked in SEQ ID No. 1. We identified a variety of conserved sequences like
CCAAT, CCACT, CACAAT, CAACCT, CCCAAT in minimal domain (b). These can be represented as C(C/A)(C/A)(A/C)T
to reflect the observed variation. These sequences are more likely present between positions -39 to -84 (i.e.
upstream of the transcription initiation site taken as +1), but may be present further upstream, as far as -150, as
seen in the database analysis. These sequences were noticed in the database to be typically interrupted by the presence
of a TGACG box. CCAAT and CCACT have been previously identified in the case of the CaMV 35S promoter by Odell,
et al., Nature 313: 810-812 (1985), and in certain other plant promoters, and are referred to as the CAT box. However,
minimal domain (b) as identified by us is invariably different from that shown in earlier studies. The utilisation of
these sequences in the context of constructing a synthetic promoter is a unique idea in the process of promoter
designing, as used by us and claimed here. Further, determining their specific positions and the variation thereof in
promoters by comparing different plant genes is also a unique approach in developing a synthetic promoter. We notice
the following sequence and variants as minimal domain (b):

5' CCACTTGACG CACAATTGAGCACAATACGCCACTTGACGCTACT 3' (SEQ ID No. 5)

which may act as part of the minimal promoter, both in the sense as well as the antisense direction. Functional
activity of the sequence constructed by us by employing a mix of C(C/A)(C/A)(A/C)T and TGACG, either in prokaryotes or
in eukaryotes and especially in plant cells, is a novel part of this invention.
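As a minimal sketch of how the motifs named above can be located, the following counts their (possibly overlapping) occurrences in SEQ ID No. 5. It assumes the sequence exactly as printed above with the internal space removed; the helper name is hypothetical. As written, the five-position expression C(C/A)(C/A)(A/C)T does not match the six-nucleotide variants CACAAT or CAACCT, so the listed variants are counted separately.

```python
import re

# SEQ ID No. 5 as printed, whitespace removed (assumption).
SEQ_ID_5 = "CCACTTGACGCACAATTGAGCACAATACGCCACTTGACGCTACT"

MOTIFS = {
    "C(C/A)(C/A)(A/C)T": "C[CA][CA][AC]T",
    "TGACG box": "TGACG",
    "CCAAT": "CCAAT",
    "CCACT": "CCACT",
    "CACAAT": "CACAAT",
    "CAACCT": "CAACCT",
    "CCCAAT": "CCCAAT",
}


def count_overlapping(pattern: str, seq: str) -> int:
    """Count matches of a regex pattern, allowing overlapping hits."""
    # A zero-width lookahead yields one (empty) match per start position.
    return len(re.findall(f"(?=(?:{pattern}))", seq))


if __name__ == "__main__":
    for name, pattern in MOTIFS.items():
        print(f"{name}: {count_overlapping(pattern, SEQ_ID_5)}")
```

The same function can be applied to the reverse complement of a sequence to look for motifs in the antisense direction.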
[0038] The conserved domain I is as given below:

5' GCTTGTACGC TGTACGCTGAC GATAGATAGATA CACGTGCACGCGT 3'
            (c)               (b)           (a)

(SEQ ID No. 6)

It is further classified into domains (a), (b) and (c). The accessory domain was determined as conserved between
nucleotides (nt) -85 to -130 but was also present upstream up to -200 nt in some of the plant genes. The accessory
domain designed by us has repeat elements of certain sequences. This may provide multiple binding sites for the
trans-acting transcriptional factors. This may lead to the formation of a stable transcriptional complex and hence
efficient transcription. In many promoters it is known that certain elements are present in multiple numbers, as in the
case of the EGFR promoter in mammalian cells, which has multiple GC boxes as shown by Johnson, et al., J. Biol. Chem.
263: 5693-5699 (1988). Also in the case of the CaMV 35S promoter, Benfey, P.N. and Chua, N-H., Science 250: 959-966
(1990), reported the presence of multiple CAT box and GATA type elements.

[0039] Domain I (a) somewhat resembles, but is different from, the GC box reported by Menkens, et al., TIBS 20:
506-510 (1995) and may play a role in the kinetics of opening of the transcription bubble and in keeping the minimal
promoter in a most active form to enhance transcription reinitiation from the transcription complex at the minimal
promoter, as suggested by Yean and Gralla, Nucl. Acids Res. 24(14): 2723-2729 (1996). The domain I (a) designed by us
is duplicated and is different from any of the earlier reported sequences, and was predicted theoretically on the basis
of computational analysis as a possible efficient domain. The number of copies that could contribute to enhancing
expression could vary, though three copies were taken by us as an example to demonstrate the principle.

[0040] Domain I (b) is also designed to be a trimer of the GATA type cis-acting element. The GATA elements are
known to associate with the CaMV 35S promoter as shown by Odell, et al., Nature 313: 810-812 (1985). On the basis
of computational analysis, we predict this as a sequence that can be used in combination with other sequences to
achieve a high level of transcription. The number of copies has been taken as three as an example, to demonstrate the
principle, and may be variable.

[0041] Domain I (c) is yet another artificial dimeric combination of the GTACGC type of elements, noticed by us as
commonly present in the region of -126 to -114 but less commonly present in [...].
GTACGC type of elements have been described as the U box by Plesse, et al., (1997) Mol. Gen. Genet. 254: 258-26[...].
We have included two such elements in the promoter designed in this study, only as an example. The number of copies
that contribute to improved function may be variable.
[0042] We predict that the three types of domains, i.e. a, b, c, individually and their combinations in single or
multiple copies, can act in co-ordination with each other either in the sense or in the antisense direction. On the
basis of our analysis of the dataset developed by us, we predict that these can even be expected to work in other
possible numbers of repeats, permutations and combinations. These domains were identified by us by theoretical analysis
and used to design a promoter region targeting high level expression of genes. Hence, the designed sequence is novel
and does not resemble any natural promoter, as far as the sequence is concerned, and has no known example of a similar
promoter reported in earlier studies.

[0043] The regions identified during our analysis mainly comprise tandem repeats of 2-8 nt length, termed
domain II(a). They are mainly spread from the -130 to the -550 nt region. These repeats include purine rich elements.
These have been identified for the first time in our analysis. These are (A/G)2-8 (SEQ ID No. 7) or its complementary
(T/C)2-8 nt. As noticed in the dataset of highly expressing genes created by us, these elements are mainly present
beyond -200 nt but may be present between -200 to -150 nt and less commonly before -130 nt. These may or may not
have specific palindromic geometry. These types of elements may be separated by 2 to 200 nt from each other. The
copy number of these elements may vary from 1-10 and less commonly may go up to 15.
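A minimal sketch of how runs of the (A/G)2-8 or (T/C)2-8 type could be listed in a sequence is given below. It assumes that a "run" means a maximal stretch of consecutive purines (or pyrimidines) whose length lies within the stated bounds; the function name and the test sequence are hypothetical. Because every nucleotide is either a purine or a pyrimidine, short runs are very common, so any real screen would also need the positional and copy-number criteria described above.

```python
import re


def base_class_runs(seq: str, bases: str = "AG", min_len: int = 2, max_len: int = 8):
    """List maximal runs made only of `bases` with min_len <= length <= max_len.

    Returns (start_index, run) tuples, with start_index 0-based.
    Use bases="AG" for purine runs and bases="TC" for pyrimidine runs.
    """
    pattern = re.compile(f"[{bases}]+")
    return [
        (match.start(), match.group())
        for match in pattern.finditer(seq.upper())
        if min_len <= len(match.group()) <= max_len
    ]


if __name__ == "__main__":
    example = "TCAGGAGTCCA"  # invented test sequence
    print(base_class_runs(example, "AG"))
    print(base_class_runs(example, "TC"))
```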

[0044] Yet another sequence typical to the dataset of highly expressing genes and identified during this analysis is
a C(AO)(A/T)C(A/T)(A/T) (SEQ ID No. 8) type of element, termed domain II(b). These elements are [...]
present upstream of the promoter beyond -200 nt, but may less commonly be present between -200 to -150 nt and
exceptionally may be located downstream of a gene. The location of these elements in the database suggests that
these enhancer elements may act in the sense as well as in the antisense direction.
[0045] Another conserved element includes the SV40 type of enhancer, the role of which has been established in
plant promoters, animal promoters and viral promoters. However, their usage in the form of an artificial promoter
has not been discussed or reported. Use of several such elements in such a way that functional co-ordination is
achieved in the form of a synthesised promoter is a new concept. Furthermore, other variants of these sequences and
those not reported earlier, like GGTAATAC (SEQ ID No. 9), termed domain II(c), have been employed in designing the
promoter. These elements are usually present after -200 nt upstream but less commonly occur before -130 nt.
[0046] Another 16 base pair palindromic sequence, 5' ACGTAAGCGCTTACGT 3' (SEQ ID No. 10), is an octopine
enhancer type of element, as are its variants, which may or may not be palindromic. These were identified during this
study to be conserved in several highly expressed plant genes and termed domain II(d). This element was located
more usually around -200 bp upstream. It may be active in both sense and antisense directions. The activity of the
natural element was shown by Gelvin, et al., Proc. Natl. Acad. Sci. USA 85: 2553-2557 (1988). However, its use in
association with other elements to develop a synthetic promoter is a novel aspect of this invention.
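In the DNA sense, a palindromic sequence is one that equals its own reverse complement, which is also why such an element reads the same in the sense and antisense directions. A minimal sketch of that check is shown below; the function names are hypothetical, and the check says nothing about biological activity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence is identical to its reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    print(is_palindromic("ACGTAAGCGCTTACGT"))  # SEQ ID No. 10
    print(is_palindromic("GGTAATAC"))          # SEQ ID No. 9
```

Variants that "may or may not be palindromic" could be compared by counting mismatches between a sequence and its reverse complement instead of testing for exact equality.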
[0047] DNA bending elements have been suggested to play an important role in bringing synergy between a basal
promoter and the upstream activating region in animal cells. We have for the first time identified a potential DNA
bending element in the highly expressed genes in plants, i.e.

5' CGATCTGACCATCTCTAGATCG 3' (SEQ ID No. 11)

This element is termed as domain III with respect to the synthetic promoter. This site resembles the DNA bending
element identified in animal promoters as shown by Kim & Shipro, Nucl. Acids Res., 24: 4341-4348 (1996). The authors
reported the potential of this element in animal cells to activate a basal promoter. This element is mostly present
between the basal promoter and the upstream activator elements. This sequence is usually found upstream of -200.
Identification of these elements in plants as well as the use of these elements in developing a synthetic promoter is a
novel aspect of this invention.

[0048] The region between the transcription start site and the TATA box is also highly conserved and was identified
by comparing several highly expressed genes. This region, viz.,

5' GGAGGTTCATTTCATTTGGATTGGACA 3' (SEQ ID No. 12)

has not been identified earlier. It does not exactly resemble any known promoter and was computed purely by aligning
the highly expressing genes and comparing the sequences with lowly expressed genes. Its distance varies between 20 and
40 nucleotides but usually is around 26 bp. This DNA sequence may function by lowering the Tm, and hence is
predicted to facilitate transcription bubble formation and increase transcription efficiency. To that extent, the use
of this element as well as its variants with lower Tm (AT richness) is a part of the new principle employed by us in
developing an artificial promoter.
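The AT richness referred to above can be checked with a few lines of code. The following is only an illustrative sketch: the helper name `at_fraction` is hypothetical and not part of the invention, and AT fraction is used here as a crude proxy for a lower Tm rather than as a thermodynamic calculation. The sequence is SEQ ID No. 12 as printed in this description.

```python
def at_fraction(seq: str) -> float:
    """Return the fraction of A and T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


# SEQ ID No. 12 as printed in this description (27 nt).
SEQ_ID_12 = "GGAGGTTCATTTCATTTGGATTGGACA"

print(f"length = {len(SEQ_ID_12)} nt, AT fraction = {at_fraction(SEQ_ID_12):.2f}")
```

A variant of the region with a higher AT fraction would, on this simple measure, be expected to melt more readily.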

[0049] The 5' untranslated leader region also modulates the level of expression, as per the scanning model
proposed by Kozak, Cell, 22: 7-8 (1980). The 40S subunit binds to the 5' cap end of eukaryotic mRNA. The efficiency of
initiation of translation depends upon smooth scanning by a ribosome and efficient recognition of the AUG context to
form a translational complex. Any strong hairpin formation in this region can adversely affect the ribosome scanning and
reduce translational efficiency. We have analysed the sequences in the untranslated leader sequence (5' UL) of plant
genes and discovered that the 5' UL of highly expressed genes more often varies from 75 to 90 nt while that of the lowly
expressed genes showed a relatively longer 5' UL ranging from 100-300 nt, and is sometimes interrupted by an intron.
We have identified 'CAA' type conserved sequences in the 5' untranslated leader region. The frequency of occurrence
of CAA in highly expressed genes in a representative data set employed by us was 3.6 elements, while that in the lowly
expressed genes was 1.1 elements, per 100 nucleotides of the leader sequence. The CAA sequences have been
recognised as translational enhancers in TMV by Gallie and Walbot, Nucl. Acids Res. 20: 4361-4368 (1992), but their
association with plant genes has not been reported earlier. The 5' UL used in this study is 81 nucleotides long. Care
was taken to avoid 'G' in the 5' UL since our data suggest poor representation of 'G' in the 5' UL of highly expressed
genes. According to the analysis, the artificial 5' UL was constructed for efficient scanning as per SEQ ID No. 13.
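The CAA frequency quoted above (elements per 100 nucleotides of leader) can be computed as sketched below. This is an assumed, minimal implementation and not the program used in the study; the function name `motif_per_100nt` and the toy leader are hypothetical.

```python
def motif_per_100nt(leader: str, motif: str = "CAA") -> float:
    """Count occurrences of `motif` in a leader and normalise to 100 nt.

    RNA input is accepted: U is treated as T before counting.
    """
    leader = leader.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    if not leader:
        raise ValueError("empty leader sequence")
    width = len(motif)
    count = sum(
        1 for i in range(len(leader) - width + 1) if leader[i:i + width] == motif
    )
    return 100.0 * count / len(leader)


# Hypothetical 100 nt leader carrying two CAA elements.
toy_leader = "CAACAA" + "T" * 94
print(motif_per_100nt(toy_leader))
```

Applied to a set of leaders from highly and lowly expressed genes, the averaged values could be compared in the same way as the 3.6 and 1.1 figures reported above.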
[0050] We also compared the translation initiation codon AUG context (which determines ribosome halting at
AUG and initiation complex formation) among highly and lowly expressed genes. Improper context leads to bypassing
of AUG by ribosomes, as shown by Kozak, J. Mol. Biol., 196: 947-950 (1987). We identified different contexts in different
groups of plant genes which show significant differences in expression. The highly expressed genes show
AT(A/C)AACAATGGCTNCCNCNA (SEQ ID No. 14) in contrast to the lowly expressed genes in plants which show
GAN ATGNGN NG N NANA (SEQ ID No. 15) (Tables 4 & 5). SEQ ID No. 15 (although does not contain G after ATG). This
indicated that the differences in the AUG context may be critical to achieve the desired level of gene expression.
Analysis of the highly expressed genes, as seen in Table 4, suggests that the former sequence and its close variants allow
high level expression of genes in nature. Hence, an artificial promoter targeted for high level of gene expression can
have SEQ ID No. 14 or its variants to the extent given in Table 4.
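A degenerate context of this kind can be searched for with a regular expression. The sketch below is illustrative only. It assumes the notation used in this description, i.e. "(A/C)" for alternatives and "N" for any nucleotide, and it assumes that the portion of SEQ ID No. 14 up to and including the start codon reads AT(A/C)AACAATG. The helper name `context_to_regex` is hypothetical, and the test fragment is taken from the 5' end of the primer of Fig. 3.

```python
import re


def context_to_regex(pattern: str) -> re.Pattern:
    """Translate a consensus such as 'AT(A/C)AACAATG' into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "(":
            # "(A/C)" becomes the character class "[AC]".
            close = pattern.index(")", i)
            parts.append("[" + "".join(pattern[i + 1:close].split("/")) + "]")
            i = close + 1
        elif ch == "N":
            parts.append("[ACGT]")
            i += 1
        else:
            parts.append(ch)
            i += 1
    return re.compile("".join(parts))


# Assumed reading of the upstream part of SEQ ID No. 14 plus the ATG codon.
HIGH_CONTEXT = "AT(A/C)AACAATG"

# 5' portion of the primer shown in Fig. 3.
fragment = "AATTACATCTAGATAAACAATGGCTTCCTCC"

match = context_to_regex(HIGH_CONTEXT).search(fragment)
print(match.group(0) if match else "no match")
```

The same translation could be applied to the variants listed in Table 4 in order to screen candidate constructs.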

[0051] A significant new finding emerges from the analysis of the first four codons in highly expressed genes in
plants. As summarised in Table 6, the first four codons in highly expressed genes predominantly code for specific
amino acids that may stabilise proteins. The first triplet is always methionine, as known already. The second triplet
predominantly codes for alanine; the third and fourth triplets code predominantly for serine. The predominant presence
of methionine, alanine and serine at the N-terminus may confer stability to highly expressed proteins by enhancing their
half life. This can facilitate their abundance. Our results suggested that following methionine and alanine at the first and
the second positions, respectively, serine is the predominant amino acid at the third and often at the fourth position in
highly expressed genes in plant cells. The use of DNA codons for these amino acids at the N-terminus in order to
achieve high level expression of genes or high stability of the proteins is a novel finding of this invention.
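Whether a coding sequence begins with the methionine-alanine-serine-serine N-terminus can be verified by translating its first four codons. The following sketch uses the standard genetic code. The function name `n_terminal_residues` is hypothetical, and the example codons are those following the start codon in the primer of Fig. 3.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def n_terminal_residues(cds: str, n: int = 4) -> str:
    """Translate the first n codons of a coding sequence (one-letter code)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) < 3 * n:
        raise ValueError("coding sequence shorter than requested codons")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, 3 * n, 3))


# First four codons downstream of the context in the Fig. 3 primer.
print(n_terminal_residues("ATGGCTTCCTCC"))
```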
[0052] The aforesaid information generated through the computational analysis was used to design a synthetic
promoter targeted for high level expression of genes. The sequence of the promoter so designed does not resemble any of
the natural promoters. The basis of the invention is to develop a database with a subset of genes that express under a
desired condition, identify the pool of cis-acting elements common to these genes and bring such elements together in
a systematic way so as to achieve the desired level and pattern of transgene expression. The present study
demonstrates the basis of promoter designing by targeting to develop a highly expressing constitutive promoter. The distances
between the several cis-elements can be variable within limits but do not match any known promoter. The sequence of
an exemplary promoter is as per SEQ ID No. 1. Several natural promoters, like the CaMV 35S promoter, have been shown
to function in unrelated organisms, like the yeast Schizosaccharomyces pombe by Gmunder and Kohli, Mol. Gen.
Genet. 220(1): 95-101 (1989) and animal tissue, like Xenopus oocytes by Ballas et al., Nucl. Acids Res. 17(19): 7891-
7903. Several of the bacterial promoters have been reported to express in plant chloroplasts and vice versa, as in
Brixey et al., Biotechnology Letters 19: 395-399 (1997) and Daniell et al., Nature Biotechnology 16: 345-348 (1998). The
structural and functional conservation of several components of the transcriptional machinery in plants, animals and
yeast, i.e. in all eukaryotes, has been reported by Gasch et al., Nature, 346: 390-394 (1990) and Vogel et al., Plant Cell
5: 1627-1638 (1993). Therefore, the said artificial promoter designed and synthesised as described by us can be used
to express foreign genes in plants, animals, bacteria and other lower organisms. Our prediction that such an artificial
synthetic promoter will be expected to express in several eukaryotes is, therefore, logical. As shown by Odell et al.,
Nature 313: 810-812 (1985), a strong promoter like the CaMV 35S promoter expresses in all parts of plants, like the
stems, leaves, roots and flowers. The examples given herein demonstrate that the promoter designed by us as an
example, for high level expression of genes, expresses efficiently in protoplasts; in all parts of plants, viz. leaves, stems
and roots; in different plant species, like tobacco, cabbage stem and potato tuber; and also in bacterial cells.
[0053] As given in figure 1, the artificially designed promoter sequence was divided into 16 overlapping
oligonucleotides, each of around 50 nucleotides in length, for the purpose of synthesising the promoter chemically. Unique SalI
and XbaI restriction enzyme sites were provided at the 5' and 3' ends respectively to facilitate cloning. As seen in figure 2,
other sites were also created inside the designed promoter sequence to facilitate future studies on various elements.
The individual oligos were synthesised on a Pharmacia LKB Gene Assembler Serial 1. These were purified using 10%
denaturing PAGE and eluted in MilliQ water from the polyacrylamide gel. Finally, the desalting was carried out using a NAP
10 (Pharmacia) column. The assembly of the oligomers was carried out using the method described by Singh et al., J.
Bioscience, 21 (6): 735-741 (1996). The assembled product was then cloned into the MCS of the SK+ Bluescript plasmid
vector (Stratagene).
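The division of a designed sequence into staggered oligonucleotides on both strands can be sketched as follows. This is a simplified illustration under stated assumptions and does not reproduce the exact oligo boundaries of figure 1: top-strand oligos tile the sequence in blocks of a fixed length, and bottom-strand oligos are offset by half a block so that each one bridges a junction between two top-strand oligos. The function names and the stand-in sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def staggered_oligos(seq: str, length: int = 50):
    """Split seq into top-strand oligos and half-offset bottom-strand oligos.

    Both lists are ordered along the top strand; bottom-strand oligos are
    written 5' to 3', i.e. as reverse complements of the region they cover.
    """
    half = length // 2
    top = [seq[i:i + length] for i in range(0, len(seq), length)]
    starts = [0] + list(range(half, len(seq), length))
    ends = starts[1:] + [len(seq)]
    bottom = [reverse_complement(seq[s:e]) for s, e in zip(starts, ends)]
    return top, bottom


# Hypothetical 100 nt stand-in; a real design would use SEQ ID No. 1.
demo = "GTCGACCATCATTTGAAAGG" * 5
top, bottom = staggered_oligos(demo, length=50)
print(len(top), "top-strand oligos;", len(bottom), "bottom-strand oligos")
```

The number of oligos obtained depends on the chosen length and offset, so the count produced by this sketch need not equal the 16 oligos used here.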

[0054] All molecular biology protocols were followed as taught in the Manual on Molecular Cloning (second edition) by
Sambrook, Fritsch and Maniatis. The clones were sequenced using an Applied Biosystem DNA sequencer. The primer
was designed for introduction of context in front of the uidA gene. The sequence of the primer is shown in figure 3.
An XbaI site was given at the end of the primer. The downstream primer was designed from the MfeI site located about 150 bp
into the uidA gene. The 150 bp fragment including the 5' end of uidA was then amplified using pBI101.1 (Clontech) as
a template. The approximately 150 bp fragment so obtained for each different context was then excised from agarose gel and blunt
end ligated to EcoRV cut SK+ Bluescript plasmid. The clones were then sequenced using an automated sequencer as
mentioned earlier.

[0055] The 2.3 kbp MfeI-EcoRI fragment of pBI101 containing the uidA gene (downstream of the MfeI site) with the nos
terminator was purified from agarose gel. This was ligated to the XbaI-MfeI fragment of approximately 150 bp representing the context.
The context-uidA constructs were cloned into pUC19 cut with XbaI and EcoRI. Positive clones were selected on a
blue/white basis and confirmed by cutting with the internal sites. The SalI-XbaI fragment (434 bp) of the artificially
designed synthetic promoter (ASP), excised from the gel, was ligated in front of each of these clones. Constructs with
the synthetic promoter in front of the uidA gene with context were named pASP. Comparison was carried out between
the synthetic and CaMV 35S promoters using biolistic and PEG mediated DNA delivery into leaf cells and protoplasts
of tobacco, cabbage and potato tuber. Transient expression was measured on the basis of GUS expression using
known techniques. The efficiency of the synthetic promoter for expression in different parts of the tobacco plant was also
measured by developing stable transgenic plants of tobacco, following transformation by Agrobacterium tumefaciens.
[0056] The details of the process of the present invention are given below and illustrated with the help of examples 
but should not be construed to limit the scope of the invention: 

EXAMPLE 1 

Transient expression of synthetic promoter by PEG mediated transformation of tobacco protoplasts. 

[0057] Protoplasts were prepared by digesting fully expanded leaves of Nicotiana tabacum in an enzyme mixture
containing 0.625% Cellulase R 250 and 0.625% Macrozyme R 250 in K3A nutrient medium (Negrutiu et al., Plant Mol. Biol.
8: 363-373 (1987)). Protoplasts were isolated by the floating method as per Negrutiu et al., ibid. 10 protoplasts were
suspended into 0.3 ml PTN (Negrutiu et al., Plant Mol. Biol. 8: 363-373 (1987)) solution. 50 µg of the DNA construct
(carrying the artificial promoter with the uidA gene) was then added immediately, followed by addition of 24% PEG (8000) to the
final concentration of 10%. An equal volume of the K3A medium was added after 20 min of incubation. After 10 min, the
total volume was made up to 3.0 ml with K3A medium. The protoplasts were incubated at 28°C for 24 h in the dark. After
24 h, protoplasts were pelleted down and washed with W5 salt solution (Negrutiu et al., Plant Mol. Biol. 8: 363-373
(1987)). Finally, protoplasts were suspended in GUS extraction buffer and lysed by sonication. Expression of GUS from the
uidA gene attached to the corresponding promoter was examined on the basis of hydrolysis of a fluorescent substrate
called MUG, i.e. 4-methylumbelliferyl glucuronide, as described in Jefferson, Plant Mol. Biol. Reporter 5 (4): 387-405
(1987). The results of the expression are given in Table 7. The synthetic promoter expresses in tobacco protoplasts at
levels three to four times higher than the native 35S CaMV promoter.
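The volume of 24% PEG stock needed to bring a suspension to a final concentration of 10% follows from a simple mass balance. The sketch below is illustrative only; it assumes that the 0.3 ml protoplast suspension contains no PEG and neglects the volume of the added DNA. The function name is hypothetical.

```python
def stock_volume_for_final(start_volume_ml: float, stock_pct: float, final_pct: float) -> float:
    """Volume of stock to add so that the mixture reaches final_pct.

    Solves stock_pct * v / (start_volume_ml + v) = final_pct for v,
    assuming the starting suspension contains none of the solute.
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return start_volume_ml * final_pct / (stock_pct - final_pct)


peg_ml = stock_volume_for_final(0.3, stock_pct=24.0, final_pct=10.0)
print(f"add about {peg_ml:.2f} ml of 24% PEG to 0.3 ml of suspension")
```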

EXAMPLE 2 

Comparison of 35S CaMV promoter with synthetic promoter using biolistics mediated delivery in tobacco leaf. 

[0058] The microprojectile mediated delivery of DNA containing the transgene (reporter gene, i.e. uidA) driven by the
CaMV 35S or the synthetic promoter sequence (described in the present invention) was achieved in tobacco leaf, using
a helium gas driven biolistics gun. The DNA was coated on gold particles of 1 µm size by mixing 3 mg particles
(suspended in water) with 5 µg DNA construct (suspension in 5 µl water), 50 µl calcium chloride (2.5 M stock solution) and
20 µl spermidine solution (0.1 M stock solution). The mixture was allowed to shake for 3 min and centrifuged briefly for
30 sec. The pellet was suspended well in 250 µl ethanol and centrifuged again briefly for 10 sec. The pellet was again
resuspended in 60 µl ethanol. Such DNA coated particles were then bombarded on leaf discs of Nicotiana tabacum
placed on MS agar medium, using a PDS 1000 He machine (Biorad Laboratory, USA). The plates were incubated
under controlled light and temperature for a period of 46 h. The GUS assay was carried out as per Jefferson, Plant Mol.
Biol. Rep. (4): 387-405 (1987). The results given in Table 8 clearly demonstrate that the synthetic promoter causes
expression of the uidA gene at a sixteen fold higher level in tobacco leaves as compared to the native 35S CaMV
promoter.
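The relative activities quoted in Examples 1 and 2 follow directly from the MUG assay values of Tables 7 and 8. The short calculation below is given only as a worked illustration; the dictionary layout and the name `fold_change` are hypothetical.

```python
def fold_change(test_activity: float, reference_activity: float) -> float:
    """Ratio of test promoter activity to reference promoter activity."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return test_activity / reference_activity


# MUG assay values (pmole/h/mg protein) as listed in Tables 7 and 8.
assays = {
    "tobacco protoplasts (Table 7)": {"artificial": 2000, "35S": 550},
    "tobacco leaf, biolistics (Table 8)": {"artificial": 7380, "35S": 443},
}

for system, values in assays.items():
    ratio = fold_change(values["artificial"], values["35S"])
    print(f"{system}: {ratio:.1f}-fold")
```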

Example 3 

Comparison of transient expression from CaMV 35S promoter with that from synthetic promoter in different
plant species.

[0059] To examine the expression of the synthetic promoter in a variety of plant species and in different explants,
the DNA was delivered by the biolistics method as in Example 2. Cotton leaves, potato tubers and cabbage stem were
selected for expression of the synthetic promoter vis-a-vis the CaMV 35S promoter. Transient transformation was carried
out using biolistics as described in Example 2. Following the bombardment, the GUS assay was carried out using MUG
substrate as per Jefferson, RA, Plant Mol. Biol. Rep. (4): 387-405 (1987). The results compiled in Table 9 demonstrate
that the synthetic promoter expressed at a substantially higher level in different plant species and in a variety of explants.
Thus the synthetic promoter designed by us expresses in a species and tissue independent manner at levels 2 to 20
times higher than the CaMV 35S promoter, following transient transformation.

Example 4 

Expression of the designed synthetic promoter in different plant parts in stably transformed tobacco plants. 


[0060] A plant expression cassette was constructed by replacing the SalI-EcoRI fragment of pBI101.1 (Clontech) with
the synthetic promoter-uidA-nos cassette. The vector was inserted into a commonly used Agrobacterium tumefaciens
strain LBA 4404 containing helper plasmid pAL4404, by electroporation (Jun, SW and Forde, BG, Nucl. Acids Res. 17:
8385 (1989)). Transgenic tobacco (Nicotiana tabacum cv petit havana) plants were developed by cocultivation of tobacco leaf
discs with Agrobacterium tumefaciens strain LBA 4404 (pAL 4404: pBIASP) for 48 h in the dark. The cocultivation was
performed on commonly used agar solidified MS (Murashige and Skoog 1962) medium. Leaf discs were transferred to
regeneration medium (MS medium + 1.0 mg/L benzyl amino purine + 0.1 mg/L naphthalene acetic acid) supplemented
with 250 mg/L cefotaxime (to inhibit bacterial growth) and 100 mg/L kanamycin to select the transformed cells. The
selection was performed for 4 weeks in 60 µmol m⁻² s⁻¹ PAR (16 h photoperiod) and 24±2°C in a culture room. The shoots
regenerated in the presence of kanamycin were excised and transferred to rooting medium (MS + 0.1 mg/L
naphthalene acetic acid + 50 µg/L kanamycin). The shoots with well developed roots were obtained after 2-4 weeks of culture
under 60 µmol m⁻² s⁻¹ PAR and 24±2°C temperature. Two different transgenic in vitro regenerated plantlets at the 4-6 leaf
stage were sacrificed to check the activity of the synthetic promoter in leaf, stem and root. The expression of the uidA gene was
checked by GUS assay using MUG as described by Jefferson, RA, Plant Mol. Biol. Rep. 4: 387-405 (1987). The results
are compiled in Table 10. The synthetic promoter shows a high level of activity in leaf, stem as well as root of R0 tobacco
plants. However, quite noticeably, the activity in roots was at least five times higher in transgenic plantlets. Thus the
synthetic promoter expresses at high level constitutively but has a preference for high expression in roots.

Example 5 

Expression of the designed synthetic promoter in the bacterium Agrobacterium tumefaciens.

[0061] To demonstrate that the synthetic promoter expressed in prokaryotes, the bacterium Agrobacterium
tumefaciens was taken as an example. The construct pBIASP expressing uidA from the synthetic promoter and pBI121
(Clontech) expressing it from the CaMV 35S promoter were transformed into Agrobacterium using electroporation as described by Jun, SW
and Forde, BG, Nucl. Acids Res. 17: 8385 (1989). The freshly transformed cells were then grown overnight in LB medium
supplemented with kanamycin (50 µg/ml) at 28°C on a shaker (250 rpm). The cells were then harvested by
centrifugation at 12,000 rpm for 5 min. The cells were then suspended in GUS extraction buffer and lysed by sonication. Debris
was then pelleted at 12,000 rpm for 10 min at 4°C. The supernatant was used for GUS assay as described by Jefferson,
RA, Plant Mol. Biol. Rep. 4: 387-405 (1987). The results in Table 11 demonstrate activity of the synthetic promoter in the
bacterium. The synthetic promoter showed 10 fold higher activity as compared to the CaMV 35S promoter.
[0062] Although the foregoing invention has been described in some detail by way of illustrations and examples for
the purposes of clarity of understanding, it is obvious that certain changes and modifications may be practised within
the scope of the variation in context sequences noticed in the statistical analysis given in Tables 1 to 6 and appended
in the claims.

INFORMATION FOR SEQ ID NO.: 1

(I) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 452 base pairs

(B) TYPE: DNA

(C) STRANDEDNESS: SINGLE OR DOUBLE

(D) TOPOLOGY: LINEAR OR CIRCULAR

(II) MOLECULE TYPE: ARTIFICIAL

(III) SEQUENCE DESCRIPTION: ARTIFICIAL SYNTHETIC PROMOTER

      -350
  1   GTCGACCATCATTTGAAAGGGCCTCGGTAATACCATTGTGGAAAAAGTTG
      DOMAIN II
 51   GTAATACGGAAAAAGAAGATTCATCATCCAGAAAAGGTGTGGAAAAGTTG
      -230 -209
      DOMAIN III
101   TGGATTGCGTGGAAAAAGTTCGATCTGACCATCTCTAGATCGTGGAAAAA
      DOMAIN II
151   GTTCACGTAAGCGCTTACGTACATATGTGGATTGTGGAAAAAGAAGACGG
      -130
      DOMAIN I
201   AGGCATCGGTGGAAAAAGAAGCTTGTACGCTGTACGCTGACGATAGATAG
      -84
      MINIMAL DOMAIN (b)
251   ATACACGTGCACGCGTCCACTTGACGCACAATTGACGCACAATGACGCCA
      -43 -26
      MINIMAL DOMAIN (a)    REGION BETWEEN TATA AND TS
301   CTTGACGCTACTTCACTATATATAGGAAGTTCATTTCATTTGGATTGGAC
      +1 TS
351   ACGTGTTGTCATTTCTCAACAATTACCAACAACAACAAACAACAAACAAC
      5' UNTRANSLATED LEADER +89
401   ATTATACAATTACTATTTACAATTACATCTAGATAAACAATGGCTTCCTCC
      +102

Table 3

Analysis of sequences around transcription start site in highly expressed genes in plants

Position   -10   -9   -8   -7   -6   -5   -4   -3   -2   -1   +1
A (%)       23   21   27   22   40   30   26   32   34   23   62
T (%)       35   35   35   35   27   26   31   37   28   25   28
G (%)       16   25   11   15   11   10   10    6   11   12    4
C (%)       26   19   27   28   22   34   33   25   27   40    6
*            N    N    N    N    N    N    N    N    N    N    A
**           N    N    N    N    A    N    N    N    N    C    A

* Consensus as per Cavener

** Consensus as per χ² test (at P ≤ 0.05, % occurrence to be ≥ 36)
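The consensus row marked * can be reproduced from the percentages in Table 3. The sketch below assumes the consensus rule of Cavener as it is commonly stated: a single nucleotide is assigned when its frequency exceeds 50% and is more than twice that of the second most frequent nucleotide; two nucleotides are assigned as co-consensus when their combined frequency exceeds 75%; otherwise the position is N. The function name and data layout are hypothetical.

```python
def cavener_consensus(freqs: dict) -> str:
    """Assign a consensus symbol for one position from percentage frequencies."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    (first, f1), (second, f2) = ranked[0], ranked[1]
    if f1 > 50 and f1 > 2 * f2:
        return first
    if f1 + f2 > 75:
        return "/".join(sorted((first, second)))
    return "N"


# Percentage occurrence at positions -10 to +1, as listed in Table 3.
TABLE_3 = {
    "A": [23, 21, 27, 22, 40, 30, 26, 32, 34, 23, 62],
    "T": [35, 35, 35, 35, 27, 26, 31, 37, 28, 25, 28],
    "G": [16, 25, 11, 15, 11, 10, 10, 6, 11, 12, 4],
    "C": [26, 19, 27, 28, 22, 34, 33, 25, 27, 40, 6],
}

consensus = [
    cavener_consensus({base: TABLE_3[base][i] for base in "ATGC"})
    for i in range(len(TABLE_3["A"]))
]
print(" ".join(consensus))
```

Under this rule only position +1 yields a defined nucleotide, in agreement with the row marked * in the table.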
Table 6

Percentage occurrence of amino acids at the first five positions coded by the highly expressed plant genes

Amino Acid        AA 1          AA 2          AA 3          AA 4
Lysine            0             0             4             13
Asparagine        0             0             2             7
Serine            0             1             28            30
Glutamic acid     0             0             4             3
Isoleucine        0             0             2             4
Arginine          0             0             2             4
Threonine         0             0             10            5
Alanine           0             [illegible]   10            7
Aspartic acid     0             [illegible]   7             [illegible]
Glycine           0             1             2             2
Valine            0             1             [illegible]   [illegible]
Glutamine         0             0             1             4
Histidine         0             0             1             2
Tyrosine          0             0             3             0
Proline           0             0             1             0
Leucine           0             0             13            7
Phenylalanine     0             0             0             3
Cysteine          0             0             2             1
Methionine        100           0             2             1



Table 7

Functional comparison of CaMV 35S promoter with that of artificial synthetic promoter using PEG mediated tobacco protoplast expression system

PROMOTER      MUG ASSAY (pmole/h/mg protein)
ARTIFICIAL    2000
35S           550

Table 8

Functional comparison of CaMV 35S promoter with that of artificial synthetic promoter using biolistic mediated DNA delivery in leaf tissue of tobacco

PROMOTER      NUMBER OF BLUE SPOTS    SIZE OF BLUE SPOTS    MUG ASSAY (pmole/h/mg protein)
ARTIFICIAL    +++++++                 ++++                  7380
35S           ++                      ++                    443



Table 9

Comparison of transient expression from CaMV 35S promoter with that from the designed synthetic promoter in different plant species.

                        Activity pmole of MU/h/mg protein
Plants                  CaMV 35S promoter    Artificial synthetic promoter
1) Tobacco (leaves)     443                  7380
2) Cotton (leaves)      376                  5640
3) Potato (tuber)       2867                 4166
4) Cabbage (stem)       3657                 3983



Table 10

Expression of the designed synthetic promoter in different plant parts in stably transformed tobacco plants.

              Activity pmole of MU/h/mg protein
Plant Part    Transgenic line 1    Transgenic line 2
Leaf          29400                35300
Stem          35362                27750
Root          104412               136537



Table 11

Expression of the designed synthetic promoter in the bacterium Agrobacterium tumefaciens

Activity pmole of MU/h/mg protein

CaMV 35S promoter    Artificial synthetic promoter
2.3 x 10^4           26 x 10^4

Claims

1 . A chemically synthesised artificial promoter comprising a DNA sequence designed for the targeted level and pat- 
tern of gene expression, by strategically putting together several signature sequences identified by sequence align- 
ment and statistical analysis of a large database constructed for this purpose. 

2. A chemically synthesised promoter as claimed in claim 1 comprising the 'minimal sequences' SEQ ID No. 2 for high 
level of expression and SEQ ID No. 3 for low level of expression and falling upstream of transcription initiation site 
as between the positions -26 to -43 and their derivatives as identified by the specified alternatives (T/A) or any of 
the four nucleotides at positions designated as N on the basis of statistical analysis of the database constructed for 
this purpose. 

3. A chemically synthesised promoter for high level expression of genes, as claimed in claim 1, which further comprises SEQ
ID No. 4 around the transcription initiation site at +1 and its derivatives designated as A/C and N and falling between
the positions -6 to +1.

4. A chemically synthesised promoter as claimed in claim 1 further comprises SEQ ID No.5 falling upstream of the 
SEQ ID No. 2 and 3 claimed in claim 2, between the position -39 to -84 for enhancing the level of expression from 
the minimal sequence as claimed in claim 2. 

5. A chemically synthesised promoter as claimed in claim 1 further comprises the conserved domain I and its sub
domains a, b and c as depicted in SEQ ID No. 6 falling upstream of SEQ ID NO. 5 claimed in claim 4, between the
positions -85 to -130 for the purpose of further enhancing the level of expression from the minimal sequence as
claimed in claim 2.

6. A chemically synthesised promoter as claimed in claim 1 further comprises conserved domain II and its sub
domains a, b, c and d as depicted in SEQ ID No. 7, 8, 9 and 10 falling upstream or downstream of SEQ ID NO. 5
claimed in claim 4, as between the position -134 to -350 and their derivatives designated as alternative choices at
individual positions as specified in the SEQ ID No. 7 to 10 and contributing individually to enhance the level of
expression from the minimal sequence claimed in claim 2.

7. A chemically synthesised promoter as claimed in claim 1 further comprises conserved domain III as depicted in
SEQ ID NO. 11 falling upstream of SEQ ID NO. 6 claimed in claim 5, as between the position -209 to -230 and
required to further enhance the level of expression from the minimal promoter sequence claimed in claim 2.

8. A chemically synthesised promoter as claimed in claim 1 further comprises SEQ ID No. 12 falling between SEQ ID 
NO. 4 and SEQ ID NO. 3 at positions +1 to -26. 

9. A chemically synthesised promoter as claimed in claim 1 further comprises SEQ ID No. 13 falling between tran- 
scription start site and the A of the translation start codon ATG at positions +1 to +69. 

10. A chemically synthesised promoter as claimed in claim 1 further comprises SEQ ID No. 14 and 15 falling in the 
region of translation initiation codon ATG, as between the positions +83 to +102 and their derivatives designated 
as alternate nucleotide or N and functioning as consensus sequences for ATG start codon 

11. A chemically synthesised promoter as claimed in claim 1 further comprises SEQ ID No. 16 coding for amino acids 
AA1 to AA4, where the said amino acids at the first four N-terminus positions are methionine-alanine-serine-serine 
for high level expression of the encoded protein. 

12. A chemically synthesised promoter as claimed in claim 1 that comprises a minimal SEQ ID No. 2 or 3 as claimed
in claim 2 and all other sequences claimed under claims 3, 4, 5, 6, 7, 8, 9, 10 and 11 that contribute by enhancing the
level of expression from the minimal promoter.

13. A method for chemically synthesising a promoter for expressing genes at a desired level in different organisms 
which comprises: 

a) Classifying DNA sequence database into highly and lowly expressed genes to align their nucleotide
sequences and to identify signature sequences around transcription/translation regulatory regions that
determine expression of the target genes.

b) Identifying conserved domains in the highly expressed genes, as identified in step (a), in critical elements
comprising minimal promoter, domain I (sub domains a, b and c), domain II (sub domains a, b, c and d),
domain III, region between transcription start and TATA, transcription start site, 5' untranslated leader,
translational initiation codon ATG context and N-terminal amino acids.

c) Designing a unique nucleotide sequence to construct an artificial synthetic promoter by placing identified
critical elements as given in step (b) above in a logical sequence as depicted in SEQ ID NO. 1 to achieve the
desired level of expression of a reporter or target gene.

d) Carrying out synthesis of the promoter DNA as obtained in step (c) above by synthesising overlapping oligos
of the said promoter SEQ ID NO. 1, assembling the said oligos into double stranded DNA and cloning of said
promoter in front of a reporter gene or a targeted gene in a suitable vector selected for expression.



14. A method as claimed in claim 13 wherein the organisms used for high level expression of the targeted genes are
selected from plants or different parts of plants including leaves, stems, roots and also in different phyla and
species, i.e. dicots, monocots, tobacco, cotton, cabbage, potato and other lower phyla such as bacteria and other
prokaryotes.

15. A method as claimed in claim 13 wherein the expression of the targeted gene from the said promoter can be
achieved in both transient as well as in stable transgenic organisms in all parts of plants, including roots, stem,
leaves, storage tissue like stem of cabbage and tubers of potato etc.

16. A method as claimed in claim 13 further comprises non specific, tissue specific, constitutive or inducible
expression of the uidA gene or other target genes in transient assay or in stable transgenic organisms from an artificial
promoter designed for expression in a targeted pattern or plant part.

17. A method for testing the high level expression in a plant using the chemically synthesised artificial promoter of SEQ ID
NO. 1 comprising polyethylene glycol (PEG) mediated transformation of plant protoplasts as well as biolistic
mediated transformation of plant tissues including root, stem, intact leaf tissue followed by transient GUS assay to
compare with natural CaMV 35S promoter showing the desired level of activity.

18. A method as claimed in claim 17 wherein the increase in the relative level of activity may depend on the plant
species or the type of explant used for the said purpose.

19. A method as claimed in claim 16 wherein the test plant used as a reference plant is a plant of tobacco, cotton,
cabbage, potato or any other species and the test part is root, shoot, leaf, storage tissue or any other tissue.

FIG. 1 DESIGNING OF OVERLAPPING OLIGOS FOR SYNTHESIS OF ARTIFICIAL PROMOTER

  1   GTCGACCATCATTTGAAAGGGCCTCGGTAATACCATTGTGGAAAAAGTTG
      CAGCTGGTAGTAAACTTTCCCGGAGCCATTATGGTAACACCTTTTTCAAC

 51   GTAATACGGAAAAAGAAGATTCATCATCCAGAAAAGGTGTGGAAAAGTTG
      CATTATGCCTTTTTCTTCTAAGTAGTAGGTCTTTTCCACACCTTTTCAAC

101   TGGATTGCGTGGAAAAAGTTCGATCTGACCATCTCTAGATCGTGGAAAAA
      ACCTAACGCACCTTTTTCAAGCTAGACTGGTAGAGATCTAGCACCTTTTT

151   GTTCACGTAAGCGCTTACGTACATATGTGGATTGTGGAAAAAGAAGACGG
      CAAGTGCATTCGCGAATGCATGTATACACCTAACACCTTTTTCTTCTGCC

201   AGGCATCGGTGGAAAAAGAAGCTTGTACGCTGTACGCTGACGATAGATAG
      TCCGTAGCCACCTTTTTCTTCGAACATGCGACATGCGACTGCTATCTATC

251   ATACACGTGCACGCGTCCACTTGACGCACAATTGACGCACAATGACGCCA
      TATGTGCACGTGCGCAGGTGAACTGCGTGTTAACTGCGTGTTACTGCGGT

301   CTTGACGCTACTTCACTATATATAGGAAGTTCATTTCATTTGGATTGGAC
      GAACTGCGATGAAGTGATATATATCCTTCAAGTAAAGTAAACCTAACCTG

351   ACGTGTTGTCATTTCTCAACAATTACCAACAACAACAAACAACAAACAAC
      TGCACAACAGTAAAGAGTTGTTAATGGTTGTTGTTGTTTGTTGTTTGTTG

401   ATTATACAATTACTATTTACAATTACATCTAGAT
      TAATATGTTAATGATAAATGTTAATGTAGATCTA

FIG. 2 RESTRICTION ENZYME SITES IN ARTIFICIAL SYNTHETIC PROMOTER

[Restriction map of the promoter sequence, numbered 1 to 434. Legible site labels include SalI, AccI, XmnI, XbaI, SnaBI, NdeI, HindIII, MluI, MfeI and a 3' XbaI.]

FIG. 3 PRIMER FOR INTRODUCTION OF ATG CONTEXT IN ARTIFICIAL SYNTHETIC PROMOTER

SEQ ID NO. 17

5' AATTACATCTAGATAAACAATGGCTTCCTCCGTAGAAA
CCCCAACCCGTGAAATCAAA 3'